Weaving Intranet Relations - Managing Web Content

Authors

  • Ulrich Bohnacker
  • Lars Dehning
  • Jürgen Franke
  • Ingrid Renz
  • René Schneider
Abstract

We give an overview of the WIR (Weaving Intranet Relations) system, a tool for offline computation and online retrieval of similar intranet documents. Text similarities are computed from a large collection of HTML documents and represented in a similarity matrix. By clicking a "What’s Related?" button, the user starts an intranet query for comparable documents, with the full text as query input, and receives a list of similar texts ranked by their similarity. The system is fully implemented and integrated into the intranet of a major company.

1 Motivation

Intranets currently play an important role in large companies, and their impact as knowledge management tools on efficient business communication and administration will grow considerably in the future. Thus, even at an early stage, an intranet's collection of texts, images, videos, etc. might turn into a confusing pell-mell. Very often, people in large companies do not know that colleagues in another department have the know-how they need to solve their problems. In this case, queries for related documents in the intranet can be a big help in finding and integrating documents. For effective retrieval and maintenance in these steadily growing intranets, a considerable number of approaches have been developed (Agosti, Crestani, Melucci 1994; Allan 1995; Salton, Singhal, Buckley, Mitra 1996; Shin, Nam 1997), but further supporting tools are still needed.

We developed a new tool, Weaving Intranet Relations (WIR), which primarily provides an innovative retrieval function. Additionally, it can support the organization and maintenance of web content by suggesting new structures and links.

1.1 Retrieval

For any ordinary user of huge text collections such as the I*nets (Internet, intranet, extranet), the main work consists of finding the relevant information. This search usually consists of looking for some user-defined keywords. Every text that contains these words is presented to the user as a good candidate for the query. The main problem is that the user must know which keywords the author used. Especially in new and innovative areas, this is a difficult task. In queries made up of a few short terms, several natural language phenomena prevent satisfying results, e.g. synonymy (a monitor in one text may be called a display in another document and a CRT in a third), homonymy (Java might be mentioned in a travel report of the executive chair or as the name of a programming language) and polysemy (table might be a piece of furniture or an occasion of gathering to eat).

These are major reasons why a full-text oriented search more often leads to better results. It is especially preferable whenever the size of the data collection can be estimated and similarity computation is feasible within an acceptable time frame. For this reason we have chosen a deductive way of computing text similarity from a large collection of intranet documents. Based on our WIR technology, we developed a system that presents thematically related documents for a given text (i.e. the query), not according to some words or concepts but according to the whole given text. The WIR system is started by a button labeled "What’s Related?". Clicking this button is the only action the reader has to perform to get a list of thematically related texts.

1.2 Maintenance

Not only the reader/searcher/surfer but also the writer or manager needs better support for the intranet. Usually, the documents should be organized according to a given classification scheme and annotated with pre-defined ...
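
The split between offline similarity computation and online retrieval described in the abstract can be illustrated with a small sketch. The excerpt does not say how WIR measures text similarity, so TF-IDF weighting with cosine similarity is used here purely as an assumed stand-in. All function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_matrix`, `whats_related`) and the sample documents are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # Smoothed inverse document frequency (an assumed weighting scheme).
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Offline step: compare every document with every other document."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


def whats_related(doc_id, matrix, top_k=5):
    """Online step: look up one row and rank the other documents by score."""
    scores = [(other, s) for other, s in enumerate(matrix[doc_id])
              if other != doc_id]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical mini-collection standing in for intranet pages.
docs = [
    "The monitor display shows a sharp image",
    "A display or monitor renders the image",
    "Travel report about the island of Java",
]
matrix = similarity_matrix(docs)       # computed once, offline
for other, score in whats_related(0, matrix):
    print(other, round(score, 3))      # ranked list for document 0
```

Because the whole text of the current page acts as the query, the online step reduces to reading one precomputed row of the matrix and sorting it, which is why a single button click suffices in this sketch.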


Similar resources

The Integration of the World Wide Web and Intranet Data Resources

The explosive growth in the volume of information available on the Web and in enterprise databases continues unabated. Managing these large quantities of information remains a challenge for both government and industry. TRW’s Digital Media Systems Lab has developed a research platform, InfoWeb, that can be described as an “information infrastructure” that provides seamless access to Web search ...


Developments in Practice VIII: Enterprise Content Management

Enterprise content management (ECM) is an integrated approach to managing all of an organization’s information including paper documents, data, reports, web pages, and digital assets. ECM includes the strategies, tools, processes, and skills an organization needs to manage its information assets over their lifecycle. While many vendors would suggest that their software is a panacea, most knowle...


A Secure and Transparent Firewall Web Proxy

The LANL transparent web proxy lets authorized external users originating from the Internet securely access internal intranet web content and applications normally blocked by a firewall. Unauthenticated access is still, of course, denied. The proxy is transparent in that no changes to browsers, user interaction, or intranet web servers are necessary. The proxy, a few thousand lines of C runn...


A Process-oriented Framework for Efficient Intranet Management

At present a lot of organizations are developing and deploying Intranets to improve the internal communication and to facilitate the distribution of information. Frequently the Intranet solution is not integrated in the business processes of the organization and responsibilities both for the operation of the necessary infrastructure and for the Intranet content are not clearly defined. This can...


Network Documentation: A Web-Based Relational Database Approach

Every organization managing a network of computers has a need to organize, maintain, and access information related to the network. Users at various levels of the organization need quick and convenient access to this critical information at all times. We propose a methodology which is Web-based and which uses relational databases at the back end to store and organize this information. We envisi...




Publication date: 2000